ABSTRACT

A new method for increasing confidence in software, based on the premise
that competent programmers write correct or "nearly" correct software, is
presented. The envisioned system takes as input a program and a set of test
data. It produces and executes a set of perturbation programs, and generates
a list indicating which perturbation programs are indistinguishable from the
original program (with the given data). A non-empty list indicates that the
data is not adequate, that there exist equivalent programs in the list, or
that the original program is incorrect. An empty list indicates that the
original program is either correct or "far" from correct. While the set of
perturbation programs should be large enough to include many commonly made
errors, it appears that there is a coupling effect suggesting that errors not
present in the set of perturbation programs are still checked by this method.
Two examples of the use of the method are given.
1. INTRODUCTION

It is well known that the design and construction of reliable software is a
difficult task. The purpose of this paper is to present a new method that may
aid in constructing reliable software, and to illustrate the application of
this method to Fortran subroutines.

Current approaches to reliable software fall into three categories:
constraining, proving, and testing. By constraining we mean those methods
that  place  restrictions  or  constraints  on  programmers  in  an  effort  to  force 
them  to  design  reliable  software.  The  whole  of  structured  programming  [1] 
with  its  restrictions  on  the  use  of  data  and  control  structures  falls  into 
this  category.  Also  included  in  this  category  are  the  attempts  to  create  a 
variety  of  methodologies,  such  as  those  of  Parnas  [2]  and  Wirth  [3],  that 
enable  programmers  to  avoid  certain  common  errors.  Although  these  methods 
have  met  with  encouraging  success  they  by  no  means  seem  to  solve  the  entire 
problem. 

Another approach is to rely on proving that a program satisfies certain
formal properties. These methods, which are usually referred to as
verification methods [4,5], hold the promise of correct software. However,
for a variety of reasons, principally the difficulty of specifying software
in formal terms and the difficulty of the proofs [6], these methods are not
yet practical for "real" programs. Indeed, one significant problem is that
while verification techniques may work on "clean" languages such as Alphard
[7] and Pascal [8], there has been little evidence that they will be
successful on "dirty" languages such as Fortran. Consequently, verification
schemes do not appear applicable to the large bulk of existing software, and
their payoff may be far in the future.

Program testing has long been in the paradoxical position of being the
traditional method of checking real software and yet, until recently [9-14],
receiving little attention in the literature. Our belief is that program
testing, while not on the sound mathematical foundation of the other methods,
can be used to develop quite powerful and useful methods for constructing
reliable software.

Program testing consists of determining if some program P works correctly
on some given data I. The basic question is then "if P works on I, is P
correct?" The answer is, of course, not necessarily: P may work on I and
yet fail on all other input data. The central problem in program testing is
to find a way of determining whether I is an adequate test of P, not in any
formal sense but rather in an empirical sense. More precisely, testing is an
inductive process, whereas other approaches, such as verification, are
deductive approaches.

The majority of work in program testing has been concerned with the use of
path analysis and symbolic execution to generate adequate test data [11].
These systems generate test data by solving a system of inequalities
constructed by symbolically executing all control paths of a program. Although
there is some evidence that the use of conditional statements in languages
such as Fortran often results in a linear system of inequalities, the general
problem of producing test data for all execution paths is an unsolvable
problem [14].

All of the methods described above ignore the fact that programs produced
by competent programmers are usually "almost" correct. Our approach relies on
this observation and attempts to provide a comprehensive evaluation of both
the program and its associated test data. We assume that if a program is not
correct, then it is a small "perturbation" of the correct one. The basic idea
is to take a program and its associated test data, generate all the
possible simple perturbations of that program, and run each one with the given
data. If all the perturbation programs give incorrect results, it is very
likely that the original program is correct. If some of the perturbation
programs yield correct results, then the data is inadequate, there is an
error, or the perturbations are equivalent programs.

Superficially, this method would appear capable of detecting only simple
errors such as typographical mistakes leading to undefined variables.
However, there exists a coupling effect: test data that distinguishes all
simple perturbations is so sensitive that it also implicitly distinguishes
complex perturbations. This effect is due to the observation that competent
programmers design programs that are very sensitive to even mild alterations.

The motivation for this approach comes from the fault detection problem in
hardware theory. For example, if C is a circuit that forms the complement
of a 32-bit number, then to test an arbitrary circuit we need to check 2^32
inputs. But circuits, like programs, are highly structured objects, and if
there is an error in C, it is very likely to be a single fault error. By
comparing the original circuit with all possible perturbations of C
involving single fault errors, it is possible to reduce the number of inputs
from 2^32 to approximately 100. The basis for presuming that C is correct is
that the probability of C containing a double fault is extremely small.

Section 2 describes the perturbation method in more detail and indicates
the type of simple perturbations we are considering. Two examples are given
in section 3. The first example illustrates the application of the method to
a correct program, and the second illustrates its application to a program
containing a complex error. This second example demonstrates the coupling
effect. Section 4 describes how the method can be incorporated into a system
and be used by both programmers and managers in large software projects.

2. DESCRIPTION OF THE METHOD

Let P be a program and let I be a set of data. We wish to determine if
I is an adequate set of test data. (We do not mean that P is guaranteed to
be correct, but rather that there is empirical evidence that P is correct,
remembering that P is not a "random" program but has been constructed by a
competent programmer.) Our method relies on the construction of a set of
perturbation programs P1,...,Pk. Initially, each Pi can be thought of as
corresponding to one of all possible errors that could be made in constructing
P. We will see, however, that a coupling effect suggests that this is an
unnecessarily conservative view. The set of data I is adequate if

(i) P works correctly on I, and

(ii) none of the Pi works correctly on I.

Clearly, if I is adequate then P is correct or the set of perturbation
programs was improperly constructed.
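
Conditions (i) and (ii) can be sketched as a small Python check. This is only an illustration under stated assumptions: the function `is_adequate`, the `oracle` argument (a stand-in for "works correctly"), and the toy absolute-value program with its three perturbations are hypothetical and do not come from the paper, whose examples are in Fortran.

```python
from typing import Callable, List, Sequence, Tuple

Program = Callable[[int], int]
_FAILED = object()  # marks a run that raised an exception


def _run(q: Program, x: int):
    """Run q on x; a crash is treated as an incorrect result."""
    try:
        return q(x)
    except Exception:
        return _FAILED


def is_adequate(
    p: Program,
    perturbations: Sequence[Program],
    data: Sequence[int],
    oracle: Program,
) -> Tuple[bool, List[int]]:
    """Return (adequate, survivors) for the test data.

    survivors lists the indices of perturbations that work correctly on
    every test case, i.e. that the data fails to eliminate.
    """
    # Condition (i): P works correctly on I.
    p_works = all(_run(p, x) == oracle(x) for x in data)
    # Condition (ii): none of the Pi works correctly on I.
    survivors = [
        i for i, q in enumerate(perturbations)
        if all(_run(q, x) == oracle(x) for x in data)
    ]
    return p_works and not survivors, survivors


def abs_p(x: int) -> int:
    """Hypothetical program under test: absolute value."""
    return x if x >= 0 else -x


PERTURBATIONS = [
    lambda x: x if x > 0 else -x,   # >= replaced by >  (equivalent to abs_p)
    lambda x: x if x <= 0 else -x,  # >= replaced by <=
    lambda x: x if x >= 0 else x,   # -x replaced by x
]

print(is_adequate(abs_p, PERTURBATIONS, [1, 2], abs))   # (False, [0, 2])
print(is_adequate(abs_p, PERTURBATIONS, [1, -2], abs))  # (False, [0])
```

With only positive inputs, two perturbations survive. Adding a negative case eliminates the third perturbation and leaves only the first, which is equivalent to the original and would have to be recognized as such by hand.
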

There are two reasons to believe that this method of considering only
simple perturbations should work. First, there is empirical evidence that most
programming errors are relatively simple. For example, Youngs [15] found that
15% of all non-syntax errors were merely instances of the use of the wrong
variable. Most of the errors studied by Gannon [16], which were used to
direct language design, were also relatively simple. Indeed, there are numerous
stories about large software systems failing because of incredibly simple
errors.

Second, we have evidence that checking only simple perturbation programs
will force the test data to be so sensitive that even more complex errors will
be checked. The significance of this coupling effect is that the potential
exists for catching a wide range of errors while only testing for simple ones.
An example of this effect is given below in connection with Hoare's FIND
program [17].

Errors of the type found by Youngs and Gannon might be called terminal
errors. This type of error provides a starting point for the construction of
the perturbation programs by making perturbations to P. Let G be a
context-free grammar for the language L. For a program P in L, define term(P)
to be all programs Q in L such that the parse tree of Q differs from the parse
tree of P only at the leaves. For example, if S is a Fortran IF statement,
then term(S) would contain only IF statements. If S is the statement

      IF (A .EQ. C) B = 2

then the statements

      IF (A .NE. C) B = 1
      IF (A .EQ. B) C = 2

are members of term(S).

If some Pi differs from P by at most k terminal symbols, then Pi is
called a k-terminal perturbation of P. The set of k-terminal perturbations
of P is denoted by term_k(P).
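
The generation of terminal perturbations can be sketched in Python for the IF statement above. This is a rough illustration, not the paper's system: the token list, the substitution classes in `CLASSES`, and the function names are assumptions, and restricting each replacement to a terminal of the same class is a simplification of "differs only at the leaves".

```python
from typing import Dict, List, Optional

# Hypothetical substitution classes for the terminals of the example statement.
CLASSES: Dict[str, List[str]] = {
    "relop": [".EQ.", ".NE.", ".LT.", ".LE.", ".GT.", ".GE."],
    "var": ["A", "B", "C"],
    "const": ["1", "2"],
}


def terminal_class(token: str) -> Optional[str]:
    """Return the class a token belongs to, or None if it is never replaced."""
    for name, members in CLASSES.items():
        if token in members:
            return name
    return None


def one_terminal(tokens: List[str]) -> List[List[str]]:
    """All statements obtained by replacing exactly one terminal."""
    result = []
    for pos, token in enumerate(tokens):
        cls = terminal_class(token)
        if cls is None:
            continue  # keywords and punctuation stay fixed
        for alt in CLASSES[cls]:
            if alt != token:
                result.append(tokens[:pos] + [alt] + tokens[pos + 1:])
    return result


S = "IF ( A .EQ. C ) B = 2".split()
ONE = one_terminal(S)
# Applying the step twice reaches everything that differs in at most two terminals.
TWO = {tuple(r) for q in ONE for r in one_terminal(q)}

print(len(ONE))                                        # 12
print(tuple("IF ( A .NE. C ) B = 1".split()) in TWO)   # True
print(tuple("IF ( A .EQ. B ) C = 2".split()) in TWO)   # True
```

Each of the two example statements in the text changes two terminals of S, so under these assumptions they first appear at k = 2.
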

There are simple errors that are not k-terminal. For example, an error in
the parentheses structure of a Fortran arithmetic expression is not
k-terminal. Errors of this type, however, are caught during compilation since
P is no longer in L. Permuting the order of program statements and failure
to initialize a variable are other examples of errors that are not k-terminal.
A more sophisticated system than the one described here could examine the
structure of a program and produce perturbations that reflect the permutation
of statements, variations of loop boundaries, and changes in the flow of
control.

3. TWO EXAMPLES

In this section, two experiments are described to illustrate the
perturbation method.

3.1 MAX

The first example involves the MAX program analyzed by Naur [18]. The
problem is to set R to the index of the first occurrence of the maximum
element in the array A(1),...,A(N). The following Fortran subroutine
performs this operation.

C     Set R to the index of the first occurrence of the maximum of A.
      SUBROUTINE MAX(A, N, R)
      INTEGER A(N), I, N, R
    1 R = 1
    2 DO 3 I = 2, N, 1
    3 IF (A(I) .GT. A(R)) R = I
      RETURN
      END

For this subroutine, the following three classes of 1-terminal
perturbations were considered.

Relational operator: Replace the relational operator .GT. in statement 3 by
all the alternatives selected from the set of relational operators {.EQ.,
.NE., .LE., .LT., .GE., .GT.}.

Constants: Replace the three occurrences of constants by members of the set
{0, 1, 2}.

Variables: Replace the seven occurrences of variables by members of the set
{R, I, N, A(I), A(R)}.

Applying these perturbations to MAX yields 39 perturbation programs, P1
through P39, whose characteristics are summarized in table 1. Fourteen of
these programs can be eliminated from further consideration by inspection.
Four programs do not compile, seven lead to subscript errors due mainly to the
use of uninitialized variables, and three have ill-formed loops. The initial
test data consisted of three cases: a three-element vector in which the
maximum is varied over the three positions.

Table 1

The MAX Experiment

perturbation      line   program   data 1, 2 & 3   data 4
.GT. -> .EQ.        3    P1
.GT. -> .NE.        3    P2
.GT. -> .LT.        3    P3
.GT. -> .LE.        3    P4
.GT. -> .GE.        3    P5             M
1 -> 0              1    P6
1 -> 2              1    P7
2 -> 0            * 2    P8
2 -> 1              2    P9             M             M
1 -> 0            * 2    P10
1 -> 2              2    P11
R -> I            * 1    P12
R -> N            * 1    P13
R -> A(I)         * 1    P14
R -> A(R)         * 1    P15
I -> R            * 2    P16
I -> N            * 2    P17
I -> A(I)         * 2    P18
I -> A(R)         * 2    P19
N -> I            * 2    P20
N -> R              2    P21
N -> A(I)         * 2    P22
N -> A(R)           2    P23
A(I) -> I           3    P24            M
A(I) -> R           3    P25
A(I) -> N           3    P26            M
A(I) -> A(R)        3    P27
A(R) -> I           3    P28
A(R) -> R           3    P29
A(R) -> N           3    P30
A(R) -> A(I)        3    P31
R -> I            * 3    P32
R -> N            * 3    P33
R -> A(I)           3    P34
R -> A(R)           3    P35
I -> R              3    P36
I -> N              3    P37
I -> A(I)           3    P38
I -> A(R)           3    P39

Data 1 = (1,2,3), data 2 = (1,3,2), data 3 = (3,1,2), data 4 = (2,2,1).
* indicates that the perturbation was eliminated before execution.
M indicates that the execution results are the same as for MAX.
x -> y represents substituting y for x in the indicated line.

The initial set of test data eliminates all but perturbations P5, P9, P24,
and P26. That is, these perturbation programs gave the same results as the
original version of MAX. The presence of P5 indicated that the inadequacy
of the test data might be due to the absence of repeated array elements.
Hence a fourth case was added to the test data to resolve this inadequacy.
The results of this test are given in the rightmost column of table 1, which
shows that all perturbations except P9 have been eliminated. P9 was formed
by a change of constants: the DO loop is started at 1 instead of 2. Although
this change results in a slightly longer execution time, close examination
reveals that P9 and MAX are functionally equivalent programs. There is
no test data that can be used to distinguish these two programs. Consequently,
MAX has passed the 1-terminal analysis.
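
The relational-operator part of this experiment can be replayed with a short Python sketch. It is a transliteration made for illustration, not the authors' system: `max_index`, `RELOPS`, and `survivors` are hypothetical names, and only the five relational-operator perturbations are modeled.

```python
import operator
from typing import Callable, Dict, List, Sequence

# The Fortran relational operators mapped to Python comparison functions.
RELOPS: Dict[str, Callable[[int, int], bool]] = {
    ".EQ.": operator.eq,
    ".NE.": operator.ne,
    ".LT.": operator.lt,
    ".LE.": operator.le,
    ".GT.": operator.gt,
    ".GE.": operator.ge,
}


def max_index(a: Sequence[int], relop: Callable[[int, int], bool] = operator.gt) -> int:
    """MAX with the relational operator of statement 3 made replaceable.

    Indices are 1-based, as in the Fortran subroutine.
    """
    r = 1
    for i in range(2, len(a) + 1):
        if relop(a[i - 1], a[r - 1]):
            r = i
    return r


def survivors(data: List[Sequence[int]]) -> List[str]:
    """Operators whose perturbation gives the same results as MAX on all data."""
    expected = [max_index(a) for a in data]
    return [
        name
        for name, op in RELOPS.items()
        if name != ".GT." and [max_index(a, op) for a in data] == expected
    ]


initial = [(1, 2, 3), (1, 3, 2), (3, 1, 2)]
print(survivors(initial))                # ['.GE.']
print(survivors(initial + [(2, 2, 1)]))  # []
```

On vectors with distinct elements .GE. behaves exactly like .GT., which is why a case with repeated elements is needed to eliminate it.
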

3.2 FIND

The second example, which is described in less detail, involves Hoare's
FIND program [17]. FIND has two input parameters: an integer array A and an
array index F. FIND is to transform A such that A(I) <= A(F) for all
I < F and A(I) >= A(F) for all I > F. This problem is of particular
interest because a subtle 2-terminal perturbation of FIND, called BUGGYFIND,
has been extensively analyzed by SELECT [19], a system that generates test
data. The subtle change is as follows. In FIND the elements of A are
interchanged depending on a conditional of the form

      X .LE. A(F) .AND. A(F) .LE. Y

Since A(F) may itself be exchanged, the effect of this test is preserved by
setting a temporary variable R = A(F) and using the conditional

      X .LE. R .AND. R .LE. Y

In BUGGYFIND, the temporary variable R is not used and the first form of the
conditional is used to determine whether the elements of A are exchanged.

The SELECT system derived the test case A = (3, 2, 0, 1) and F = 3, on
which BUGGYFIND fails. The authors of SELECT observed that BUGGYFIND fails on
only 2 of the 24 permutations of (0, 1, 2, 3), indicating that the error is
very subtle. (We found that BUGGYFIND failed on only one case, namely A =
(3, 2, 0, 1) and F = 3.)

Taking BUGGYFIND as the original program, consider the following 1-terminal
perturbations.

BF1: the conditional is X .LE. A(F) .AND. R .LE. Y.

BF2: the conditional is X .LE. R .AND. A(F) .LE. Y.
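
The four variants can be sketched in Python on top of the textbook form of Hoare's FIND. This is an assumption-laden illustration: the paper's Fortran source is not shown, so the loop structure below and the mapping of the two halves of the conditional to the left and right scans are guesses, and only the single failing case quoted above has been traced by hand for the three faulty variants.

```python
from itertools import permutations
from typing import List, Sequence


def find(a: Sequence[int], f: int, variant: str = "FIND") -> List[int]:
    """Rearrange a around position f (1-based); variant selects the conditional.

    FIND uses the saved value r in both scans, BUGGYFIND uses A[f] in both,
    BF1 uses A[f] in the left scan only, BF2 uses A[f] in the right scan only.
    """
    A = [None] + list(a)  # pad so that indices are 1-based, as in Fortran
    m, n = 1, len(a)
    while m < n:
        r = A[f]
        i, j = m, n
        while i <= j:
            while A[i] < (r if variant in ("FIND", "BF2") else A[f]):
                i += 1
            while (r if variant in ("FIND", "BF1") else A[f]) < A[j]:
                j -= 1
            if i <= j:
                A[i], A[j] = A[j], A[i]
                i += 1
                j -= 1
        if f <= j:
            n = j
        elif i <= f:
            m = i
        else:
            break
    return A[1:]


def satisfies(a: Sequence[int], f: int) -> bool:
    """The FIND postcondition: A(I) <= A(F) for I < F and A(I) >= A(F) for I > F."""
    pivot = a[f - 1]
    return all(x <= pivot for x in a[: f - 1]) and all(x >= pivot for x in a[f:])


case = (3, 2, 0, 1)
print(find(case, 3))               # [0, 1, 2, 3]
print(find(case, 3, "BUGGYFIND"))  # [0, 2, 1, 3]
print(find(case, 3, "BF1"))        # [0, 2, 1, 3]
print(find(case, 3, "BF2"))        # [0, 2, 1, 3]
```

On this case the three faulty variants agree with one another and violate the postcondition, since A(2) = 2 exceeds A(3) = 1.
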

The results of running these programs on all permutations of (0, 1, 2, 3) are
listed in table 2. Observe that in all cases BUGGYFIND, BF1, and BF2 produce
identical results. Consequently, since BF1 and BF2 cannot be eliminated,
BUGGYFIND must be viewed with some suspicion. The important point of this
example is that a 2-terminal error was detected using only 1-terminal
perturbations. This illustrates the coupling effect, which indicates that
simple perturbations are capable of detecting more complex errors.

Hanson, Lipton, and Sayward

Table 2

The FIND Experiment, F = 3

A          FIND       BUGGYFIND  BF1        BF2
0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
0 1 3 2    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
0 2 1 3    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
0 2 3 1    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
0 3 1 2    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
0 3 2 1    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
1 0 2 3    1 0 2 3    1 0 2 3    1 0 2 3    1 0 2 3
1 0 3 2    1 0 2 3    1 0 2 3    1 0 2 3    1 0 2 3
1 2 0 3    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
1 2 3 0    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
1 3 0 2    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
1 3 2 0    1 0 2 3    1 0 2 3    1 0 2 3    1 0 2 3
2 0 1 3    1 0 2 3    1 0 2 3    1 0 2 3    1 0 2 3
2 0 3 1    1 0 2 3    1 0 2 3    1 0 2 3    1 0 2 3
2 1 0 3    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
2 1 3 0    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
2 3 0 1    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
2 3 1 0    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
3 0 1 2    1 0 2 3    1 0 2 3    1 0 2 3    1 0 2 3
3 0 2 1    1 0 2 3    1 0 2 3    1 0 2 3    1 0 2 3
3 1 0 2    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
3 1 2 0    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3
3 2 0 1    0 1 2 3    0 2 1 3    0 2 1 3    0 2 1 3
3 2 1 0    0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3

4. IMPLEMENTATION CONSIDERATIONS

We envision a system in which there is some offline programmer-system
interaction. The programmer submits a "well-analyzed" program and test data
to the system. The system returns a list of perturbation programs that are
indistinguishable from the original program using the given data. If the list
is long, the programmer may wish to simply enrich the data and try again. If
the list is short (on the order of 10 programs), the programmer may analyze
the perturbations by hand to determine whether they are equivalent programs or
there is reason to suspect the original program. The key point is that all
perturbation programs thought to be equivalent to the original program must be
"signed off" by the programmer before the program is accepted. Thus by having
the system generate reports indicating who has signed off various programs,
subsequent failure of a program can be readily attributed to the proper
source.

An apparent drawback to this method is the potentially explosive number of
perturbation programs, even at the 1-terminal level. The brute-force approach
leads to a large number of programs to be compiled and executed. There seems
to be little that can be done to reduce the execution time necessary to run
all interesting perturbations. However, considering the enormous amount of
time currently spent on ad hoc testing and debugging techniques, the amount of
CPU time required does not seem excessive.

For example, the Fortran version of FIND consists of about 30 statements.
This size is typical for a module in a well-structured system. Assuming that
it requires 0.01 seconds to run a test case for a four-element array, running
all 24 permutations described above requires 0.24 seconds. There are
approximately 1000 1-terminal perturbations in the Fortran version of FIND.
Thus a complete analysis of 1-terminal perturbations requires about 240
seconds, that is, less than 5 minutes of CPU time.
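
The estimate is a simple product, which the following Python lines restate. The function name and parameters are hypothetical; the figures are the ones quoted in the text, and compilation time is ignored here as it is in the estimate.

```python
def analysis_cpu_seconds(perturbations: int, test_cases: int, seconds_per_case: float) -> float:
    """CPU time to run every perturbation on every test case."""
    return perturbations * test_cases * seconds_per_case


total = analysis_cpu_seconds(1000, 24, 0.01)
print(round(total), "seconds, about", round(total / 60), "minutes")
```
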

There are several ways to reduce the number of programs to be executed.
A significant number of perturbations will be rejected by the compiler. The
techniques of some current program validation systems can be applied to
further reduce the number of programs. For example, the DAVE system [20] can be
used to eliminate programs having uninitialized variables. Presumably, the
competent programmer rarely makes such errors. The set of perturbation
programs may be further reduced by using a symbolic execution system to
eliminate programs containing non-executable statements or unreachable paths.

A more serious problem arises when a perturbation program does not halt.
To handle this, we assume that the original program completes in at most t
seconds. The system then stops any perturbation program that runs longer than
ct seconds, where c is some constant supplied by the programmer. These
non-halting programs have not been eliminated; rather, they are reported to
the programmer, who must decide either to eliminate them by hand or to increase
the value of c and try again.
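
The time limit can be sketched in Python with a child process per program. This is a minimal illustration under stated assumptions: programs are given as Python source strings, t is measured by timing one run of the original (process start-up included), and the names `timed_run` and `classify` are hypothetical.

```python
import subprocess
import sys
import time
from typing import Dict, Optional, Tuple


def timed_run(source: str, limit: float) -> Tuple[str, Optional[str], float]:
    """Run source in a child interpreter; return (status, stdout, seconds)."""
    start = time.perf_counter()
    try:
        done = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=limit,  # the child is killed when the limit expires
        )
    except subprocess.TimeoutExpired:
        return "stopped", None, time.perf_counter() - start
    return "halted", done.stdout, time.perf_counter() - start


def classify(original: str, perturbations: Dict[str, str], c: float,
             first_limit: float = 60.0) -> Dict[str, str]:
    """Label each perturbation after running it for at most c * t seconds."""
    status, expected, t = timed_run(original, first_limit)
    if status != "halted":
        raise RuntimeError("the original program did not halt")
    report = {}
    for name, source in perturbations.items():
        status, output, _ = timed_run(source, c * t)
        if status == "stopped":
            report[name] = "did not halt within c*t"  # left to the programmer
        elif output == expected:
            report[name] = "indistinguishable"
        else:
            report[name] = "eliminated"
    return report


if __name__ == "__main__":
    print(classify(
        "print(sum(range(10)))",
        {"same": "print(45)", "diff": "print(44)", "loop": "while True: pass"},
        c=20.0,
    ))
```

Programs that exceed the limit are reported separately rather than counted as eliminated, matching the description above; a larger c can then be tried.
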
5. CONCLUSIONS

The system described in this paper represents a practical approach to
program testing. The method rests on the validity of the coupling effect, that
is, that simple perturbations are sufficient to catch more complex errors than
actually tested by the simple perturbations. Although the coupling effect is
mathematically unprovable, initial experience, as demonstrated by BUGGYFIND,
suggests its validity. The perturbation approach also offers the possibility
of testing existing programs, while most of the other approaches to software
validation, such as program verification, the design of new languages, or
specialized methodologies, are applicable only to new programs or programs
written in a specific language.

In addition, by the appropriate choice of perturbations, the method
includes several other testing schemes as subcases. For example, given a
program and its test data we can determine if a given statement is executed by
simply changing that statement to one that divides by zero. If the program
does not halt due to a divide-by-zero error, the statement is not executed
using the given data. Similar techniques can be used to determine if certain
control paths are traversed or if a given variable is referenced before its
definition.
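
The divide-by-zero probe can be sketched in Python as follows. The example program `sign`, the helper `is_executed`, and the line-based representation are assumptions made for illustration; the sketch only handles simple statements and assumes the original program never divides by zero itself.

```python
from typing import List, Sequence

# Hypothetical program under test, kept as source lines so one can be replaced.
PROGRAM: List[str] = [
    "def sign(x):",
    "    if x > 0:",
    "        return 1",
    "    if x < 0:",
    "        return -1",
    "    return 0",
]


def is_executed(lines: Sequence[str], line_no: int, func_name: str,
                data: Sequence[int]) -> bool:
    """True if some test case reaches the statement at lines[line_no]."""
    target = lines[line_no]
    indent = target[: len(target) - len(target.lstrip())]
    mutated = list(lines)
    # The perturbation: replace the statement by one that divides by zero.
    mutated[line_no] = indent + "1 / 0"
    namespace: dict = {}
    exec("\n".join(mutated), namespace)
    for x in data:
        try:
            namespace[func_name](x)
        except ZeroDivisionError:
            return True  # the planted statement was reached
    return False


print(is_executed(PROGRAM, 4, "sign", [1, 0]))      # False
print(is_executed(PROGRAM, 4, "sign", [1, 0, -5]))  # True
```

Here the data [1, 0] never reaches the statement that returns -1, so that perturbation runs to completion; adding a negative input triggers the planted error.
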

The system described here is also useful as a management tool. The
software manager can use the reports generated by such a system to monitor and
control the development of the modules in a large project. As mentioned
above, a module P might be considered acceptable if its associated test data
distinguished it from all its perturbations Pi or, in the event that some
small number of the Pi were indistinguishable from P, the programmer
responsible for P certified that those Pi were equivalent programs. If P
subsequently failed, the perturbations of P that were thought to be equivalent
could provide an indication of why P failed. The manager might also use
this information in evaluating programmer performance.

Recent research in programming language design [16] and programming
methodology [9] indicates that better languages and specialized methodologies
can significantly improve software reliability. Empirical data obtained by
testing programs using the perturbation scheme may also offer some insight into
what specific kinds of language features and methodologies actually reduce
errors. For example, a high incidence of a particular error that causes
certain perturbations to be consistently indistinguishable from the original
program would suggest that the language be changed to prevent that error. The
indistinguishable perturbations would point to the undesirable language
feature. In addition, the method might show the presence of deficiencies in
module specifications, which, in turn, would suggest deficiencies in the
programming methodology used. Again, the perturbation programs would help
determine the nature of the deficiency.

Finally, as with any system that collects empirical data, the possibility
exists for self-improvement. Initially, the perturbations of a program are
produced without any real knowledge of which ones will be helpful in
correcting errors. Continued use of the system, however, would provide data on
which types of perturbations are most useful. Such data could then be used to
"tune" the system for better performance.